Florian Wilhelm - Extending Scikit-Learn with your own Regressor
[25 July 2014]
We show how to write your own robust linear estimator within the Scikit-Learn framework using as an example the Theil-Sen estimator known as "the most popular nonparametric technique for estimating a linear trend".
Scikit-Learn (http://scikit-learn.org/) is a well-known and popular framework for machine learning that is used by Data Scientists all over the world. We show in a practical way how you can add your own estimator following the interfaces of Scikit-Learn. First we give a small introduction to the design of Scikit-Learn and its inner workings. Then we show how easily Scikit-Learn can be extended by creating an own estimator. In order to demonstrate this, we extend Scikit-Learn by the popular and robust Theil-Sen Estimator (http://en.wikipedia.org/wiki/Theil%E2%80%93Sen_estimator) that is currently not in Scikit-Learn. We also motivate this estimator by outlining some of its superior properties compared to the ordinary least squares method (LinearRegression in Scikit-Learn).